feat: add code source references and related page tagging; emits Rela…#832
Merged
…ted/Implementation blocks into /<slug>/index.md and /llms-full.txt, and JSON-LD TechArticle keywords + relatedLink into <head>; No visible change; all headless updates
Documentation Preview Ready
Your documentation preview has been successfully deployed!
Preview URL: https://d3ehv1nix5p99z.cloudfront.net/pr-cms-832/docs/user-guide/quickstart/overview/
Updated at: 2026-05-14T17:32:42.175Z
…er facing page; add title uniqueness test
zastrowm
reviewed
May 13, 2026
Replaces the earlier raw-overlap algorithm with a specificity-weighted
Jaccard scorer plus a confidence floor for the human-facing surface.
Algorithm (src/util/related-docs.ts):
- Score is rarity-weighted Jaccard: each shared tag's contribution scales
with its rarity in the corpus (1 - freq/N), so a shared `bedrock` (8/108)
outweighs a shared `aws` (14/108). Self-correcting if a tag bloats over
time — its weight automatically drops.
- Headless surface (/<slug>/index.md, /llms-full.txt, JSON-LD relatedLink):
top 10, no floor. LLMs benefit from wider recall and self-filter.
- Human surface ("See also" pill row): top 6 with score >= 0.4 floor.
Empty strip is strictly better than a misleading one; the floor catches
the spurious 1-broad-tag matches that earlier audits kept finding by
hand.
- Specificity table memoized on the input array (cheap WeakMap hit).
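A minimal sketch of the scorer described above, not the code in src/util/related-docs.ts. The `Page` shape and the names `specificityTable` and `score` are hypothetical, and the score is assumed to take the usual weighted-Jaccard form (weight of shared tags divided by weight of all tags on either page):

```typescript
// Hypothetical page shape; the real types may differ.
interface Page {
  slug: string;
  title: string;
  tags: string[];
}

// Memoize the specificity table on the input array itself, so repeated
// calls with the same corpus array reuse one table.
const specificityCache = new WeakMap<readonly Page[], Map<string, number>>();

function specificityTable(pages: readonly Page[]): Map<string, number> {
  const cached = specificityCache.get(pages);
  if (cached) return cached;

  const tagged = pages.filter((p) => p.tags.length > 0);
  const counts = new Map<string, number>();
  for (const page of tagged) {
    // De-duplicate so a tag repeated on one page counts once.
    for (const tag of Array.from(new Set(page.tags))) {
      counts.set(tag, (counts.get(tag) ?? 0) + 1);
    }
  }

  // Rarity weight: 1 - freq/N, where N is the number of tagged pages.
  const table = new Map<string, number>();
  counts.forEach((count, tag) => {
    table.set(tag, 1 - count / tagged.length);
  });
  specificityCache.set(pages, table);
  return table;
}

// Assumed form: sum of weights of shared tags over sum of weights of all tags.
function score(a: Page, b: Page, table: Map<string, number>): number {
  const tagsA = new Set(a.tags);
  const tagsB = new Set(b.tags);
  let shared = 0;
  let union = 0;
  new Set(a.tags.concat(b.tags)).forEach((tag) => {
    const weight = table.get(tag) ?? 0;
    union += weight;
    if (tagsA.has(tag) && tagsB.has(tag)) shared += weight;
  });
  return union === 0 ? 0 : shared / union;
}
```

Because the weights are recomputed from corpus counts, a tag that spreads to more pages lowers its own contribution without any change to the scorer.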
Vocabulary (src/config/tags.yml):
- 32-tag registry organized by axis (topics / capabilities / lifecycle).
- Inline rule and authoring checklist at the head of the YAML — "tag =
teaches, not mentions" with concrete failure-pattern examples drawn
from earlier audit rounds.
- Build-time Zod validation (src/config/tags.ts); unknown tags fail.
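The "unknown tags fail" rule can be illustrated with a dependency-free stand-in. This is not the Zod schema in src/config/tags.ts; the `Registry` type, the `validateTags` helper, and the sample tags with their axis assignments are all assumptions made for the example:

```typescript
// Hypothetical registry shape mirroring the three axes named above.
type Registry = Record<"topics" | "capabilities" | "lifecycle", string[]>;

// A small sample only; the axis each tag belongs to is assumed here,
// and this is not the full 32-tag registry.
const registry: Registry = {
  topics: ["aws", "bedrock", "safety"],
  capabilities: ["tool-execution"],
  lifecycle: ["production"],
};

// Throw on the first page that uses a tag missing from the registry,
// which is the behavior a build-time check would need.
function validateTags(slug: string, tags: string[], reg: Registry): void {
  const known = new Set(reg.topics.concat(reg.capabilities, reg.lifecycle));
  const unknown = tags.filter((tag) => !known.has(tag));
  if (unknown.length > 0) {
    throw new Error(`${slug}: unknown tag(s): ${unknown.join(", ")}`);
  }
}
```

Failing hard on a typo such as `awz` keeps the vocabulary closed, so the rarity weights are never diluted by accidental one-off tags.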
Content:
- 108 of 123 user-guide pages tagged; 15 deliberately untagged
(umbrella/policy pages and known structural cases).
- Two audit passes done; broad-tag misuses (production, aws on
session-management, tool-execution on model providers, etc.) stripped.
Surface design (src/components/RelatedPagesInline.astro):
- Outlined pills wrapping to fit content. Small "Related" caption above.
- Mounted in MarkdownContent.astro above Starlight's Prev/Next pagination.
- No surface-aware filtering — algorithm is unaware of Pagination.
Lint:
- Title-uniqueness check in test/content-collection.test.ts catches
cross-section title collisions (the kind that produce ambiguous
"Hooks" pills).
311 tests pass; 14 cover the algorithm directly.
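The title-uniqueness lint could look roughly like the sketch below. The `Doc` shape and `findTitleCollisions` are hypothetical names, not the contents of test/content-collection.test.ts:

```typescript
// Hypothetical minimal view of a content-collection entry.
interface Doc {
  slug: string;
  title: string;
}

// Group slugs by title and keep only titles used by more than one page.
function findTitleCollisions(docs: Doc[]): Map<string, string[]> {
  const byTitle = new Map<string, string[]>();
  for (const doc of docs) {
    const slugs = byTitle.get(doc.title) ?? [];
    slugs.push(doc.slug);
    byTitle.set(doc.title, slugs);
  }
  const collisions = new Map<string, string[]>();
  byTitle.forEach((slugs, title) => {
    if (slugs.length > 1) collisions.set(title, slugs);
  });
  return collisions;
}
```

A test would then assert that the returned map is empty and print the colliding slugs when it is not.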
…t labels
- Drop "right outcome" / "to bring those pages back" prose from
HUMAN_SCORE_FLOOR comment; keep the calibration note and a one-line
hint to add sharper tags.
- Update describe/it labels in the headless test suite from "Jaccard" /
"tag-overlap size" to the current language ("specificity-weighted
Jaccard" / "score") so test output reflects the actual algorithm.
No behavior change. 14 tests still pass.
The visible pill row at the bottom of user-guide articles is removed.
Tag-driven Related Pages remains in the headless surfaces:
- `## Related pages` block in /<slug>/index.md and /llms-full.txt
- `relatedLink` array in <head> JSON-LD `TechArticle`
What this removes:
- src/components/RelatedPagesInline.astro (deleted)
- humanRelatedUserGuideFor() and its tests
- HUMAN_MAX, HUMAN_SCORE_FLOOR constants and the floor docblock
- import + render from MarkdownContent.astro
Tag authoring rule (src/config/tags.yml header) updated to reflect that
the surface is now headless-only.
306 tests pass; build clean; HTML body verified to contain zero traces of
the removed strip.
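For orientation, the two remaining headless surfaces might be emitted along these lines. This is a sketch under assumptions: `RelatedPage`, `relatedPagesMarkdown`, and `techArticleJsonLd` are hypothetical names, the bullet format of the markdown block is a guess, and the base URL is taken from the JSON-LD example in the PR description:

```typescript
// Hypothetical minimal shape of an already-ranked related page.
interface RelatedPage {
  slug: string; // e.g. "docs/user-guide/safety-security/guardrails"
  title: string;
}

// Assumed site origin, copied from the JSON-LD example in the PR description.
const SITE = "https://strandsagents.com";

// Markdown surface: links point at index.md siblings so a reader following
// them stays on the markdown surface. The bullet format is assumed.
function relatedPagesMarkdown(pages: RelatedPage[]): string {
  const lines = pages.map((p) => `- [${p.title}](/${p.slug}/index.md)`);
  return ["## Related pages", ""].concat(lines).join("\n");
}

// JSON-LD surface: same pages, same order, as canonical HTML URLs.
function techArticleJsonLd(headline: string, tags: string[], pages: RelatedPage[]) {
  return {
    "@type": "TechArticle",
    headline,
    keywords: tags.join(", "),
    relatedLink: pages.map((p) => `${SITE}/${p.slug}/`),
  };
}
```

Feeding both functions the same ranked list keeps the two surfaces in the same order by construction.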
zastrowm
approved these changes
May 14, 2026
Tag-driven Related Pages
Note: this change does not alter any human-facing pages. It only adds related-pages and source-reference metadata for agents and headless browsers. As a follow-up we /may/ add tags at the bottom of pages that group other related pages.
`sourceLinks` infrastructure (schema, renderer for `## Implementation` blocks) is wired but no values are set, pending the upcoming monorepo migration. Re-adoption is purely a frontmatter exercise.
Surfaces
- `## Related pages` block: `/<slug>/index.md` and aggregated in `/llms-full.txt`; `llms.txt` consumers
- `relatedLink`: `TechArticle` graph node
Algorithm
Score is rarity-weighted Jaccard. Each tag's rarity weight is `1 - (pages_with_tag / total_tagged_pages)`, so common tags (e.g. `aws`, used by 14 pages) contribute less than rare tags (e.g. `agentcore`, used by 4), whose weight is close to 1.
A 1-tag match on a broad tag ranks below any 2-tag specific match, so coincidental connections sink naturally. A tag that bloats over time loses weight automatically, with no manual audit cycle needed to react. Ties break alphabetically by title (deterministic).
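The ranking step, including the deterministic tie-break, can be sketched as follows. `Scored` and `rank` are hypothetical names, and the default limit of 10 follows the headless "top 10, no floor" rule stated in the commit message:

```typescript
// Hypothetical shape: a candidate page with its precomputed score.
interface Scored {
  title: string;
  score: number;
}

// Sort by score descending; break ties alphabetically by title so the
// output order never depends on input order. Plain code-unit comparison
// is used here to avoid locale-dependent behavior.
function rank(candidates: Scored[], limit: number = 10): Scored[] {
  return candidates
    .slice() // copy so the caller's array is not mutated
    .sort((a, b) => {
      if (b.score !== a.score) return b.score - a.score;
      if (a.title < b.title) return -1;
      if (a.title > b.title) return 1;
      return 0;
    })
    .slice(0, limit);
}
```

With equal scores, a page titled "Agents" sorts ahead of one titled "Hooks" regardless of the order in which they were scored.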
Tag registry
`src/config/tags.yml`: 32 tags grouped by axis (topics / capabilities / lifecycle), validated by Zod at build time. The YAML header documents the rule: … with a four-trap checklist drawn from concrete failure modes encountered while tagging the corpus.
Coverage
112 of 123 user-guide pages tagged. 11 are deliberately untagged (umbrella pages, policy docs, pages with no good cross-section bridge).
Concrete output for `safety-security/guardrails`
Source frontmatter:
`/docs/user-guide/safety-security/guardrails/index.md` and the same block inside `/llms-full.txt`
Top 10, ranked by score, no floor:
Links are emitted as `index.md` siblings so an LLM following them stays on the markdown surface.
`<head>` JSON-LD `TechArticle.relatedLink`
Same 10 entries, same ordering, but as canonical HTML URLs:
    {
      "@type": "TechArticle",
      "headline": "Guardrails",
      "keywords": "safety, bedrock, aws",
      "relatedLink": [
        "https://strandsagents.com/docs/user-guide/concepts/model-providers/amazon-bedrock/",
        "https://strandsagents.com/docs/user-guide/concepts/model-providers/amazon-nova/",
        /* ... */
      ]
    }
HTML "See also" pills
Top 6 from the same ranking, filtered to score ≥ 0.4. On Guardrails this lands as 6 pills; on a thin-tag page (e.g. `retry-strategies`, whose best score is 0.34) the strip is empty, by design.
Safety nets
- `test/content-collection.test.ts`: fails build on cross-section title collisions (the "two pages named Hooks" failure mode that produces ambiguous pills).
- `test/related-docs.test.ts`: covers scoring, tie-breaking, score floor, edge cases. 311 tests total in the repo.
Verification
- `npm run build`: clean, no broken links across 515 pages
- `npm run typecheck`: clean
- `npm test`: 311 pass
Type of Change
Checklist
`npm run dev`
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.